Benchmarking Attribute Selection Techniques for Discrete Class Data Mining
Authors
Abstract
Data engineering is generally considered to be a central issue in the development of data mining applications. The success of many learning schemes, in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model building process can result in poor predictive performance and increased computation. Attribute selection generally involves a combination of search and attribute utility estimation, plus evaluation with respect to specific learning schemes. This leads to a large number of possible permutations, which is why very few benchmark studies have been conducted. This paper presents a benchmark comparison of several attribute selection methods for supervised classification. All the methods produce an attribute ranking, a useful device for isolating the individual merit of an attribute. Attribute selection is achieved by cross-validating the attribute rankings with respect to a classification learner to find the best attributes. Results are reported for a selection of standard data sets and two diverse learning schemes, C4.5 and naive Bayes.
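The rank-then-cross-validate procedure described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the ranking criterion (information gain), the hand-written naive Bayes learner, the fold assignment, the toy data, and all function names are assumptions chosen only to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain of one discrete attribute with respect to the class."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[attr]].append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_attributes(rows, labels):
    """Attribute indices sorted from highest to lowest information gain."""
    return sorted(range(len(rows[0])),
                  key=lambda a: info_gain(rows, labels, a), reverse=True)

def nb_predict(train_rows, train_labels, row, attrs):
    """Naive Bayes with Laplace smoothing, restricted to the attributes in attrs."""
    best, best_score = None, -math.inf
    for c, n_c in Counter(train_labels).items():
        score = math.log(n_c / len(train_labels))
        for a in attrs:
            n_values = len({r[a] for r in train_rows})
            match = sum(1 for r, y in zip(train_rows, train_labels)
                        if y == c and r[a] == row[a])
            score += math.log((match + 1) / (n_c + n_values))
        if score > best_score:
            best, best_score = c, score
    return best

def cv_accuracy(rows, labels, k_attrs, folds=5):
    """Cross-validated accuracy when only the top k_attrs ranked attributes are used."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(rows)) if i % folds != f]
        test = [i for i in range(len(rows)) if i % folds == f]
        tr_rows = [rows[i] for i in train]
        tr_labels = [labels[i] for i in train]
        # Rank inside the fold so the held-out rows never influence the ranking.
        attrs = rank_attributes(tr_rows, tr_labels)[:k_attrs]
        correct += sum(nb_predict(tr_rows, tr_labels, rows[i], attrs) == labels[i]
                       for i in test)
    return correct / len(rows)

def select_attributes(rows, labels, folds=5):
    """Keep the ranking prefix whose cross-validated accuracy is highest."""
    n_attrs = len(rows[0])
    # max() returns the first maximum, so ties favour the smaller attribute set.
    best_k = max(range(1, n_attrs + 1),
                 key=lambda k: cv_accuracy(rows, labels, k, folds))
    return rank_attributes(rows, labels)[:best_k]

# Toy data (hypothetical): attribute 0 determines the class, attributes 1 and 2 do not.
rows = [(i % 2, (i // 2) % 3, (i // 6) % 2) for i in range(24)]
labels = ["pos" if r[0] == 1 else "neg" for r in rows]
selected = select_attributes(rows, labels)
print(selected)
```

On this toy data the only predictive attribute is ranked first and a prefix of length one already reaches the best cross-validated accuracy, so the selection stops there. Swapping in a different ranking function or learner only requires replacing `rank_attributes` or `nb_predict`.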
Similar resources
A novel feature selection techniques based on contrast set mining
Data classification is a challenging task in the era of big data due to the high number of features. Feature selection is a step in the process of knowledge discovery in data that aims to reduce dimensionality and improve classification performance. The purpose of this research is to define new techniques for feature selection in order to improve classification accuracy and reduce the time required for...
A Novel Tree Based Classification
Classification is a data mining (DM) technique used to predict or forecast unknown information using historical data. There are many classification techniques. ID3 is a very popular tree-based classification algorithm for categorical data, and it does not support continuous data. The attribute selection process plays a major role in building a classification tree model. Attribute Selection in...
Bayesian Models to Assess Risk of Corruption of Federal Management Units
This paper presents a data mining project that generated Bayesian models to assess risk of corruption of federal management units. With thousands of extracted features related to corruptibility, the data were processed using techniques like correlation analysis and variance per class. We also compared two different discretization methods: Minimum Description Length Principle (MDLP) and Class-At...
Benchmarking Relief-Based Feature Selection Methods
Modern data mining requires feature selection methods that can (1) be applied to large scale feature spaces, (2) function in noisy problems, (3) detect complex patterns of association (e.g. interactions), (4) be flexibly adapted to various problem domains and data types, and (5) remain computationally tractable. To that end, this work examines a set of filter-style feature selection algorithms ins...
Handling Sparse Data Sets by Applying Contrast Set Mining in Feature Selection
A data set is sparse if the number of samples in a data set is not sufficient to model the data accurately. Recent research emphasized interest in applying data mining and feature selection techniques to real world problems, many of which are characterized as sparse data sets. The purpose of this research is to define new techniques for feature selection in order to improve classification accur...
Journal title:
- IEEE Trans. Knowl. Data Eng.
Volume 15, Issue
Pages -
Publication date: 2003